

Search for: All records

Creators/Authors contains: "Stone, Gregory"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Testing is a part of education around the world; however, there are concerns that the consequences of testing are underexplored within current educational scholarship. Moreover, usability studies are rare within education. One aim of the present study was to explore the usability of a mathematics problem-solving test called the Problem Solving Measures–Computer-Adaptive Test (PSM-CAT), designed for students in grades six to eight (ages 11–14). The second aim of this mixed-methods research was to unpack consequences-of-testing validity evidence related to the results and test interpretations, leveraging the voices of participants. A purposeful, representative sample of over 1000 students from rural, suburban, and urban districts across the USA was administered the PSM-CAT followed by a survey; approximately 100 of those students were interviewed after test administration. Findings indicated that (1) participants engaged with the PSM-CAT as intended and found it highly usable (e.g., most respondents were able to find and use the calculator, and several students commented that they engaged with the test as intended) and (2) the benefits of testing largely outweighed any negative outcomes (e.g., 92% of students interviewed had positive attitudes towards the testing experience), which in turn supports consequences-of-testing validity evidence for the PSM-CAT. This study provides an example of a usability study for educational testing and builds on previous calls for greater consequences-of-testing research.
    Free, publicly-accessible full text available June 1, 2026
  2. This study explored how mathematics problem-solving constructed-response tests compared in terms of item psychometrics when administered to eighth-grade students in two different static formats: paper-pencil and computer-based. Quantitative results indicated similar performance across all psychometric indices, both for the overall tests and at the item level.
    Free, publicly-accessible full text available March 8, 2026
  3. Kosko, Karl W; Caniglia, Joanne; Courtney, Scott A; Zolfaghari, Maryam; Morris, Grace A (Ed.)
    Free, publicly-accessible full text available November 10, 2025
  4. Kosko, Karl W; Caniglia, Joanne; Courtney, Scott A; Zolfaghari, Maryam; Morris, Grace A (Ed.)
    Free, publicly-accessible full text available November 10, 2025
  5. Kombe, Dennis; Wheeler, Ann (Ed.)
    The purpose of this proceeding is to share a component of a validity argument for a new, computer-adaptive mathematics Problem-Solving Measure designed for grades six through eight (PSM 6-8). The PSM is a single test that uses computer-adaptive features to measure students' problem-solving performance relative to instructional standards.
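    The abstract above names computer-adaptive features without describing their mechanics. As an illustration only, the core selection step of a Rasch-based computer-adaptive test can be sketched as choosing the unadministered item with maximum Fisher information at the current ability estimate. The item bank, item ids, and difficulty values below are hypothetical, not taken from the PSM 6-8.

    ```python
    import math

    def rasch_prob(theta, b):
        """Probability of a correct response under the Rasch model."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def item_information(theta, b):
        """Fisher information of a Rasch item at ability theta: p(1 - p)."""
        p = rasch_prob(theta, b)
        return p * (1.0 - p)

    def next_item(theta, item_bank, administered):
        """Pick the unadministered item with maximum information at theta."""
        candidates = [i for i in item_bank if i not in administered]
        return max(candidates, key=lambda i: item_information(theta, item_bank[i]))

    # Hypothetical item bank: item id -> Rasch difficulty.
    item_bank = {"q1": -1.2, "q2": -0.4, "q3": 0.0, "q4": 0.7, "q5": 1.5}
    # After administering q3, the next item is the one whose difficulty is
    # closest to the current ability estimate (information peaks at theta = b).
    print(next_item(0.1, item_bank, administered={"q3"}))
    ```

    In a full CAT, the ability estimate would be re-estimated after each response and the loop repeated until a stopping rule (e.g., a standard-error threshold) is met.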
  6. Smith, Richard (Ed.)
    Lengthy standardized assessments decrease instructional time while increasing concerns about student cognitive fatigue. This study presents a methodological approach for item reduction within a complex assessment setting using the Problem Solving Measure for Grade 6 (PSM6). Five item-reduction methods were used to shorten the PSM6, and each shortened instrument was evaluated through validity evidence for test content, internal structure, and relationships to other variables. The two quantitative methods (Rasch model and point-biserial) produced the best-performing shortened assessments psychometrically but were not representative of all content subdomains, while the three qualitative (content-preservation) methods produced psychometrically poor assessments that retained all subdomains. Specifically, the ten-item Rasch and ten-item point-biserial shortened tests demonstrated the strongest overall validity evidence, but future research is needed to explore the psychometric performance of these versions in a new, independent sample and the necessity of subdomain representation. The study provides a methodological framework researchers can use to reduce the length of existing instruments while identifying how the various reduction strategies may sacrifice different information from the original instrument. Practitioners are encouraged to carefully examine the extent to which their reduced instrument aligns with their pre-determined criteria.
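    The point-biserial reduction strategy mentioned above can be sketched as ranking items by their corrected point-biserial correlation (each 0/1 item against the total score excluding that item) and keeping the top k. This is a minimal illustration under assumed data; the response matrix and cutoff are invented, and the actual PSM6 procedure may differ in its correction and tie-handling.

    ```python
    import statistics

    def reduce_items(responses, k):
        """Rank items by corrected point-biserial (item vs. rest-of-test score)
        and return the indices of the k highest-ranking items, sorted."""
        n_items = len(responses[0])
        totals = [sum(row) for row in responses]
        ranked = []
        for j in range(n_items):
            col = [row[j] for row in responses]
            rest = [t - c for c, t in zip(col, totals)]
            # Pearson correlation of a 0/1 item with the rest-score equals the
            # corrected point-biserial (statistics.correlation needs Python 3.10+).
            ranked.append((statistics.correlation(col, rest), j))
        return sorted(j for _, j in sorted(ranked, reverse=True)[:k])

    # Hypothetical 0/1 response matrix: 6 students x 4 items.
    responses = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 1, 0],
        [0, 0, 0, 0],
    ]
    print(reduce_items(responses, 2))  # keeps the two most discriminating items
    ```

    A content-preservation method would instead constrain the kept set to cover every subdomain, which, as the abstract notes, can trade away psychometric quality.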
  7. Lamberg, Teruni; Moss, Diana (Ed.)
    Depth-of-knowledge (DOK) is a means to communicate the cognitive demand of tasks and is often used to categorize assessment items. Webb's (2002) framework has been applied across content areas. The aim of this two-phase iterative study was to modify Webb's DOK framework for word problems. Through work with school partners, this iterative design-based research study provides supportive evidence for a modified DOK framework reflecting levels of complexity in word problems. The resulting modified framework presents an opportunity for mathematics educators to reflect on various aspects of cognitive complexity.
  8. Determining the most appropriate method of scoring an assessment depends on multiple factors, including the intended use of results, the assessment's purpose, and time constraints. Both the dichotomous and partial credit models have their advantages, yet direct comparisons of assessment outcomes from each method are not typical with constructed-response items. The present study compared the impact of both scoring methods on the internal structure and consequential validity of a middle-grades problem-solving assessment, the Problem Solving Measure for Grade 6 (PSM6). After the assessment was scored both ways, Rasch dichotomous and partial credit analyses indicated similarly strong psychometric findings across models. Student outcome measures on the PSM6, scored both dichotomously and with partial credit, demonstrated a strong, positive, significant correlation. Similar demographic patterns were noted regardless of scoring method. Both scoring methods produced similar results, suggesting that either would be appropriate to use with the PSM6.
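    To make the two scoring approaches concrete, here is a small, hypothetical illustration (invented data, not from the PSM6) of scoring the same constructed responses with partial credit and dichotomously (only full credit counts as correct), then correlating student totals, mirroring the kind of comparison described above.

    ```python
    import statistics

    # Hypothetical constructed-response scores for 5 students on 4 items:
    # 0 = incorrect, 1 = partial credit, 2 = full credit.
    partial_scores = [
        [2, 2, 1, 0],
        [2, 1, 1, 1],
        [1, 0, 2, 0],
        [0, 1, 0, 0],
        [2, 2, 2, 2],
    ]

    def totals_partial(matrix):
        """Total score when partial credit is awarded."""
        return [sum(row) for row in matrix]

    def totals_dichotomous(matrix, full=2):
        """Total score when only full-credit responses count as correct."""
        return [sum(1 for s in row if s == full) for row in matrix]

    tp = totals_partial(partial_scores)      # [5, 5, 3, 1, 8]
    td = totals_dichotomous(partial_scores)  # [2, 1, 1, 0, 4]
    # Pearson correlation of the two totals (statistics.correlation, Python 3.10+);
    # a strong positive value indicates the rankings largely agree.
    print(statistics.correlation(tp, td))
    ```

    A full replication of the study would fit Rasch dichotomous and partial credit models to each scoring, not just correlate raw totals; this sketch only shows why the two scorings can order students similarly.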